{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 下载 llama.cpp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "source /etc/network_turbo\n",
    "\n",
    "git clone https://github.com/ggerganov/llama.cpp\n",
    "\n",
    "cd llama.cpp\n",
    "\n",
    "pip install -r requirements.txt\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 配置llama.cpp环境"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "mkdir build\n",
    "\n",
    "sudo apt-get update\n",
    "\n",
    "\n",
    "sudo apt-get install make cmake gcc g++ locate\n",
    "\n",
    "cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_ENABLE_UNIFIED_MEMORY=1 -DLLAMA_CURL=OFF\n",
    "\n",
    "cmake --build build --config Release -j8\n",
    "\n",
    "cd build\n",
    "\n",
    "make install\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 获取gguf格式的模型（参考上一个视频）"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 如果不量化，保留模型的效果\n",
    "python convert_hf_to_gguf.py /root/autodl-tmp/Qwen/Qwen3-8B  --outtype f16 --verbose --outfile /root/autodl-tmp/Qwen/qwen3_8b_f16.gguf\n",
    "#### 如果需要量化（加速并有损效果），直接执行下面脚本就可以\n",
    "python convert_hf_to_gguf.py /root/autodl-tmp/Qwen/Qwen3-8B  --outtype q8_0 --verbose --outfile /root/autodl-tmp/Qwen/qwen3_8b_q8_0.gguf\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 与模型对话"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "./bin/llama-cli -m /root/autodl-tmp/Qwen/qwen3_8b_q8_0.gguf -cnv --n-gpu-layers 2000\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 开放api接口"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "-ts:指定n张gpu的分配比例\n",
    "./bin/llama-server -m /root/autodl-tmp/Qwen/qwen3_8b_q8_0.gguf --port 6006 --n-gpu-layers 2000 -ts 1,1\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## openai服务"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<think>\n",
      "好的，用户问我叫什么名字。我需要按照设定来回答。根据之前的设定，我的名字是通义千问，但用户可能更喜欢叫我小通。不过有时候用户可能直接叫我通义千问，所以需要确认一下。\n",
      "\n",
      "首先，用户可能只是想确认我的名字，或者他们可能对我的名字有疑问。我需要保持友好和自然的语气。应该先直接回答名字，然后提供一些额外的信息，比如我的功能或用途，这样用户会更清楚我的作用。\n",
      "\n",
      "另外，用户可能是在测试我的反应，或者他们之前听说过我，想确认是否是同一个模型。因此，我需要确保回答准确且信息丰富。同时，保持回答简洁，避免过于冗长。\n",
      "\n",
      "还要注意用户可能的后续问题，比如询问我的功能或如何使用。所以，在回答中可以适当引导用户提出更多问题，比如询问他们需要什么帮助。这样可以促进进一步的互动。\n",
      "\n",
      "最后，检查是否有任何可能的误解，比如用户可能混淆了不同的模型名称，或者对我的功能有疑问。需要确保回答清晰，没有歧义。\n",
      "</think>\n",
      "\n",
      "我的名字是通义千问，你可以叫我小通。我是通义实验室开发的超大规模语言模型，能够帮助你回答问题、创作文字、编程、分析数据等等。有什么需要帮助的吗？😊\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "from openai import OpenAI\n",
    "\n",
    "# client = OpenAI(\n",
    "#     # 若没有配置环境变量，请用百炼API Key将下行替换为：api_key=\"sk-xxx\",\n",
    "#     api_key=os.getenv(\"DASHSCOPE_API_KEY\"), # 如何获取API Key：https://help.aliyun.com/zh/model-studio/developer-reference/get-api-key\n",
    "#     base_url=\"https://dashscope.aliyuncs.com/compatible-mode/v1\",\n",
    "# )\n",
    "\n",
    "client = OpenAI(\n",
    "   \n",
    "    api_key=\"na\", \n",
    "    base_url=\"http://localhost:8000/v1\",\n",
    ")\n",
    "\n",
    "\n",
    "completion = client.chat.completions.create(\n",
    "    model=\"na\", \n",
    "    messages=[\n",
    "        {'role': 'system', 'content': 'You are a helpful assistant.'},\n",
    "        {'role': 'user', 'content': '你叫什么名字'}\n",
    "        ]\n",
    ")\n",
    "print(completion.choices[0].message.content)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "ame",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
